Context. The AGILE space mission (whose instrument is sensitive in the energy ranges 18-60 keV, and 30 MeV - 50 GeV) has been
operating since 2007. Assessing the statistical significance of time variability of γ-ray sources above 100 MeV is a primary task of
the AGILE data analysis. In particular, it is important to check the instrument sensitivity in terms of Poisson modeling of the data
background, and to determine the post-trial confidence of detections.

Aims. The goals of this work are: (i) evaluating the distributions of the likelihood ratio test for "empty" fields, and for regions of the 
Galactic plane; (ii) calculating the probability of false detection over multiple time intervals. 

Methods. In this paper we describe in detail the techniques used to search for short-term variability in the AGILE γ-ray source
database. We describe the binned maximum likelihood method used for the analysis of AGILE data, and the numerical simulations
that support the characterization of the statistical analysis. We apply our method to both Galactic and extra-galactic transients, and
provide a few examples.

Results. After having checked the reliability of the statistical description tested with the real AGILE data, we obtain the distribution 
of p-values for blind and specific source searches. We apply our results to the determination of the post-trial statistical significance of 
detections of transient γ-ray sources in terms of pre-trial values.

Conclusions. The results of our analysis allow a precise determination of the post-trial significance of γ-ray sources detected by
AGILE.

1. Introduction

The current generation of γ-ray space missions, AGILE and Fermi, is
sensitive in the energy range above 100 MeV up to tens or hundreds of GeV.
These missions are providing a great wealth of data on a variety of γ-ray
sources, both in our Galaxy and at extragalactic distances. Compared with
the previous generation of γ-ray instruments (e.g., EGRET on board the
Compton Gamma-Ray Observatory), the AGILE and Fermi-LAT γ-ray imagers are
based on silicon detectors with optimal spatial resolution and much
improved background rejection. These characteristics allow reaching a few
arcminute positioning for intense sources and very large fields of view
(FOVs). Both AGILE and Fermi-LAT reach FOVs more than 100 degrees across,
i.e., 2.5 sr, and this fact is of great relevance for the statistical
analysis of the γ-ray sources. Based on the current detector performances,
it is crucial to perform a statistical analysis specialized to the specific
modes of operation of the two γ-ray missions. In particular, the
distinction between non-significant and significant steady/transient
sources must be supported by a specific treatment of the satellite γ-ray
data. The relatively large effective areas at γ-ray energies, and the very
large FOVs, produce large exposures (measured as the product of effective
area times the on-source duration). The statistical significance of γ-ray
source detections must address the pre- vs. post-trial significance. As we
will see, the characteristics of the pointing and of the specific search
(whether a "blind" search or a search for a specific source) have a direct
influence on the statistical treatment. In this paper we address the issue
of the statistical determination of sources for the AGILE mission.

The AGILE-GRID instrument for γ-ray observations has an energy range of
30 MeV - 50 GeV (Tavani et al. 2009a). AGILE data are down-linked
approximately every 100 minutes and sent to the AGILE Data Center (ADC),
which is part of the ASI Science Data Center (ASDC), for data reduction,
scientific processing, and archiving. ASDC forwards the AGILE data to the
AGILE Team local sites where a Quick Look analysis is performed.

Because of the low detection rate of events and the extent of the
AGILE-GRID point-spread function (PSF), statistical techniques like the
maximum likelihood method (or estimator) are required to analyze the AGILE
point sources. Similar analysis techniques were used for EGRET data
(Mattox et al. 1996).

For steady sources and sources with high γ-ray flux, the significance of a
detection increases as a function of the observation duration. This is not
necessarily the case for variable sources with low flux: a short
integration time (the integration period should be of the same time scale
as the duration of the astronomical phenomena under study) may yield a low
significance level because the observation is photon limited; for a longer
integration time the source may disappear entirely because the observation
is background limited.

Send offprint requests to: A. Bulgarelli, e-mail: bulgarelli@iasfbo.inaf.it

The main purpose of this work is the evaluation of the likeli- 
hood ratio test in the context of short time scale (1-2 days) flaring 
γ-ray sources. The likelihood ratio test is used to compare two
ensembles of hypotheses, one in which a γ-ray source is present,
and another (the null hypothesis) in which it is absent. The de- 
termination of the likelihood ratio distribution in the case of the 
null hypothesis is used to evaluate the occurrence of false posi- 
tive detections. 

The search for γ-ray transients (Galactic and extra-Galactic)
detectable on timescales of 1-2 days is one of the major activities
performed by the AGILE Collaboration¹.

The main goals of this work are: (i) to evaluate the distribution
of the likelihood ratio test statistic ($T_s$) for empty fields
both outside of and within the Galactic plane; (ii) to evaluate the
$T_s$ distribution in the case of one flaring source with different
combinations of parameters; and (iii) to calculate the confidence
levels corresponding to the same false detection probability in
multiple non-overlapping time periods.

We performed Monte Carlo simulations to determine the $T_s$
distributions both in the presence and absence of a flaring source,
which may differ from the theoretically predicted distribution for
a number of reasons which we discuss below. We compare these
results with the AGILE data where feasible, and show how the
formulation of the hypotheses influences the $T_s$ distribution.

In Section 2 we present the AGILE-GRID maximum likelihood
analysis method. In Section 3 we report a general description
of the performed Monte Carlo simulations. In Section 4
we characterize the $T_s$ distribution of a simulated extra-Galactic
empty field and compare it with a real observation case. Section 5
characterizes the $T_s$ distribution for two simulated Galactic
fields, a simple and a complex case; in the latter we analyze the
effect that uncertainties in the analysis parameters produce on
the $T_s$ distribution. In Section 6 we describe the pre- and
post-trial probability, and in Section 7 we consider multiple detections
from the same sky position.



2. Analysis method 

The likelihood ratio test is used to compare two ensembles of
models, one of which is a subset of the other, each of which
can be characterized by a set of parameters. In the most common
case, one of the ensembles of models is the null hypothesis,
while the other, of which the null hypothesis is a subset,
is the alternative hypothesis, corresponding to, for example, the
hypothesis of the existence of a source. In each case, the values
of the set of parameters are found by means of a maximum
likelihood method or estimator which maximizes the likelihood
of producing the data given the ensemble of models. The
application of likelihood to photon-counting experiments is
described in Cash (1979). Details of how the likelihood is
calculated in the context of γ-ray data analysis can be found in
Mattox et al. (1996). The likelihood ratio is then simply the ratio
of these two maximum likelihoods, and the test statistic $T_s$ is
defined as

$$ T_s = -2 \ln \frac{L_0}{L_1} \tag{1} $$

where $L_0$ and $L_1$ are the maximum values of the likelihood
function for the null hypothesis and for the alternative hypothesis,
respectively.

¹ Two independent γ-ray transient search systems have
been developed. One system operates at INAF/IASF-Bologna
(Bulgarelli et al. 2009); it is able to process the data within 1.5 hours
(from the last photon acquired in orbit to the alert generation), and to
generate alerts via e-mail and SMS to the mobile phones of the AGILE
Collaboration. A second system operates at the ADC (AGILE Data
Center) in Frascati; it can react within an average time of 3 hours, and
generates alerts via e-mail but with more accurate data processing.

In the AGILE-GRID case, the data are binned into FITS
counts maps, while each model is a linear combination of
isotropic and Galactic diffuse components of the γ-ray emission
and point sources. γ-ray exposure maps and Galactic diffuse
emission maps are used to calculate the models. Among the
parameters which may be varied to find the maximum likelihood
are the coefficients of the diffuse and point source components.
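As an illustration only, the binned likelihood and the test statistic of Eq. (1) can be sketched in Python for the simplest case of one point source with free counts and fixed position. The function names, the toy four-bin maps, and the golden-section maximizer are assumptions made for this sketch; they are not the AGILE analysis software.

```python
import math

def log_likelihood(counts, model):
    """Poisson log-likelihood of binned counts given predicted counts per bin."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(counts, model))

def model_counts(s_c, background, psf):
    """Predicted counts: diffuse background plus s_c source counts spread by the PSF."""
    return [b + s_c * p for b, p in zip(background, psf)]

def fit_source_counts(counts, background, psf, upper=1000.0, tol=1e-6):
    """Maximize the likelihood over s_c >= 0 with a golden-section search.

    The log-likelihood is concave in s_c, so a bracketing search is sufficient.
    """
    f = lambda s: log_likelihood(counts, model_counts(s, background, psf))
    g = (math.sqrt(5) - 1) / 2
    a, b = 0.0, upper
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return (a + b) / 2

def compute_ts(counts, background, psf):
    """T_s = -2 ln(L0 / L1): L0 is the source-free model, L1 the best fit with s_c >= 0."""
    s_hat = fit_source_counts(counts, background, psf)
    ln_l0 = log_likelihood(counts, model_counts(0.0, background, psf))
    ln_l1 = log_likelihood(counts, model_counts(s_hat, background, psf))
    return max(0.0, -2.0 * (ln_l0 - ln_l1)), s_hat

# Toy example (hypothetical numbers): four bins, flat background,
# PSF fractions summing to 1.
background = [2.0, 2.0, 2.0, 2.0]
psf = [0.1, 0.4, 0.4, 0.1]
ts, s_hat = compute_ts([2, 9, 8, 3], background, psf)
```

In this sketch the diffuse coefficients are held fixed inside `background`; leaving them free would add parameters to both the null and the alternative fit.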

In general, to describe a single point source, four parameters
are used: the predicted source counts $s_c$, the spectral index
$s_{si}$ (the spectrum of each source is assumed to be a power law),
and two parameters corresponding to the position of the source
($s_l$, $s_b$, in Galactic coordinates). It is possible to keep each
parameter free or fixed; a free parameter is allowed to vary to find
the maximum likelihood. Possible combinations include: only
allowing the flux to vary, allowing both the position and flux to
vary, allowing both the spectrum and flux to vary, or leaving all
four parameters free. Throughout this paper we keep the spectral
index fixed to 2.1; for this kind of source the AGILE-GRID
instrument has a point spread function (PSF) at 30° off-axis of
2.1° for E > 100 MeV, 1.1° for E > 400 MeV, and 0.8° for
E > 1 GeV.

The two parameters that describe the Galactic (diffuse) and
isotropic γ-ray emission are: (i) $g_{gal}$, the coefficient of the
Galactic diffuse emission model, and (ii) $g_{iso}$, the isotropic
diffuse intensity. A value of $g_{gal} < 1$ is expected if the Galactic
diffuse emission model is correct. Values of $g_{iso}$ between 1 and 15
$\times 10^{-5}$ cm$^{-2}$ s$^{-1}$ sr$^{-1}$ are expected, depending on pointing strategy
and on board and background filter rejection. These two parameters
can be left free or fixed independently. For short timescale
variability (less than 3-4 days) of γ-ray sources, usually we first
estimate these parameters with a longer timescale integration,
and then fix them for the short timescale analysis, assuming
that these components do not vary significantly on the shorter
timescales. We note that for the AGILE data used in this analysis
(based on the standard filter FM3.119), the isotropic component
is dominated by instrumental charged particle background
rather than by the extragalactic diffuse emission, in contrast to
data from EGRET and Fermi-LAT.

The values of the parameters which maximize the likelihood 
are those which describe the model in the ensemble most likely 
to reproduce the data. 

The null hypothesis corresponds to the absence of the point 
source, while the alternative hypothesis corresponds to its pres- 
ence. Clearly, the null hypothesis is a subset of the alternative hy- 
pothesis, corresponding to a source with zero flux. The Galactic 
diffuse and isotropic coefficients, as well as the parameters of 
other known point sources in the field of view, must be kept ei- 
ther fixed or free in the same manner when evaluating both the 
null and alternative hypothesis. 

In order to limit the effect of systematic errors far from the 
position of the hypothesized source, the data bins evaluated (and 
their predicted values according to the models) are limited to an 
analysis region of radius 5° or 10° centered around the source 
position. 
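A minimal sketch of how bins could be restricted to such an analysis region is given below. The haversine formula is a standard choice for the angular separation; the function names and the representation of bins as (l, b) centre pairs are assumptions of this example, not the actual AGILE implementation.

```python
import math

def angular_separation_deg(l1, b1, l2, b2):
    """Great-circle distance in degrees between two Galactic positions (haversine)."""
    l1, b1, l2, b2 = map(math.radians, (l1, b1, l2, b2))
    h = (math.sin((b2 - b1) / 2) ** 2
         + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def bins_in_region(bin_centers, source_l, source_b, radius_deg=10.0):
    """Keep only the bins whose centre lies within radius_deg of the source position."""
    return [(l, b) for l, b in bin_centers
            if angular_separation_deg(l, b, source_l, source_b) <= radius_deg]

# Hypothetical bin centres around (l, b) = (160, 0).
selected = bins_in_region([(160.0, 0.0), (164.0, 3.0), (175.0, 0.0)], 160.0, 0.0)
```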

A. Bulgarelli: Evaluating Maximum Likelihood for AGILE Short-Term Variability

From Wilks's theorem (Wilks 1938), the $T_s$ distribution $\varphi$ is
expected to asymptotically follow $\chi^2_{n-m}$ in the null hypothesis,
where $n - m$ is the number of additional parameters that are
optimized in the alternative hypothesis. In the simplest case,
$n - m = 1$ (e.g., in the case of the determination of the flux
of a single source). This means that from Wilks's theorem, $T_s$
is expected to be asymptotically distributed as $\chi^2_1$ in the null
hypothesis. The expected departure of the distribution from $\chi^2_1$ is of
order $N^{-1/2}$, where N is the number of samples. In our context,
the number of samples is the number of photons which carry
information about all the parameters; these are all the photons in
the analysis region. This is true regardless of the number that is
eventually estimated to come from the point source.

When there are multiple sources whose fluxes are allowed to
vary, the following procedure, divided into two loops, is used
to find the maximum $T_s$. In the first loop, the sources are first
sorted according to hypothesized flux. One by one, the sources
are added to the model, from highest to lowest flux. If the source
flux is allowed to vary, then the maximum likelihood is found
both in the presence and absence of the source. If the position is
allowed to vary, the first fit is done at a fixed position and the
resulting $T_s$ is compared to a location confidence level threshold
($T_{lcl}$). If $T_s > T_{lcl}$, then $T_s$ is again maximized with variable
flux and position. $T_{ss}$ is a threshold for promoting the source to
the second loop of the algorithm: if the final $T_s$ is greater than
$T_{ss}$, the source is considered significant and added to both the
null and alternative hypotheses for the other sources. If not, it is
considered undetected, and is set to zero flux for all subsequent
analyses. With the sources over the $T_{ss}$ threshold (with or without
a location confidence level), the second loop is similar to the
first loop, except that all of the sources marked significant in the
first loop are contained in the models from the beginning. The
sources are again evaluated one at a time from highest to lowest
flux. The $T_s$ of each source is again maximized, and set to its
final value. The values of the parameters $T_{lcl}$ and $T_{ss}$ affect the
behavior of the procedure. $T_{lcl} = 5.99147$ corresponds to a 95%
confidence level for two degrees of freedom.
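The correspondence between the threshold 5.99147 and the 95% confidence level can be checked directly: for two degrees of freedom the chi-square survival function is exp(-x/2), so the threshold has a closed form. The short Python check below is only a worked example, and the function name is hypothetical.

```python
import math

def contour_threshold(confidence_level):
    """Chi-square threshold for two degrees of freedom at a given confidence level.

    For 2 dof the survival function is exp(-x / 2), which inverts in closed form.
    """
    return -2.0 * math.log(1.0 - confidence_level)

t_lcl_95 = contour_threshold(0.95)   # close to the 5.99147 quoted in the text
```

Under the same formula, the value 2.29575 that appears in Table 1 corresponds to a confidence level of about 68.3% for two degrees of freedom.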

The maximum likelihood estimator developed for AGILE
constrains the flux of a source, and therefore the source counts
$s_c$, to be greater than or equal to zero. Because the ensemble of
models considered is half of the theoretically possible number,
the shape of the $T_s$ distribution differs from that of Wilks's
theorem by being asymptotically distributed as $\frac{1}{2}\chi^2_1$ instead of
$\chi^2_1$ (Mattox et al. 1996).
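The origin of the factor 1/2 can be illustrated with a toy Monte Carlo. The sketch below assumes an idealized Gaussian estimator of the source signal rather than the AGILE likelihood: without the constraint the statistic would be x², distributed as chi-square with one degree of freedom; with the constraint, every negative fluctuation is mapped to zero.

```python
import math
import random

def simulate_constrained_ts(n_trials, seed=0):
    """Toy model: x ~ N(0, 1) is the unconstrained signal estimate under the null.

    Forcing the signal to be >= 0 gives T_s = x^2 for x > 0 and T_s = 0 otherwise.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        x = rng.gauss(0.0, 1.0)
        values.append(x * x if x > 0.0 else 0.0)
    return values

ts_values = simulate_constrained_ts(200_000)
frac_zero = sum(t == 0.0 for t in ts_values) / len(ts_values)
frac_above_4 = sum(t > 4.0 for t in ts_values) / len(ts_values)
# Half of the chi-square (1 dof) tail probability above 4.
expected_above_4 = 0.5 * math.erfc(math.sqrt(4.0 / 2.0))
```

About half of the toy trials end at exactly zero, and the tail above any positive threshold is about half of the chi-square tail, which is the behavior described by the 1/2 factor.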

In order to compare the data distribution of $T_s$ produced by
the AGILE analysis procedure with that predicted by Wilks's
theorem, we performed a series of Monte Carlo simulations of
AGILE data. Each simulation of an analysis region and its
subsequent maximum likelihood analysis constitutes a single trial.
The probability that the result of a trial in an empty field has
$T_s > h$ (that is, the complement of the cumulative distribution
function) is

$$ P(T_s > h) = \int_h^{\infty} \varphi(x)\,dx \tag{2} $$

This is also called the p-value, $p = P(T_s > h)$. This is the
pre-set (pre-trial) type-1 error (a false positive, rejecting the null
hypothesis when in fact it is true). Given a statistical distribution,
a "p-value" assigned to a given value of a random variable is
defined as the probability of obtaining that value or larger when
the null hypothesis is true. This value may be interpreted as an
"occurrence rate": on average, 1/p trials occur before
obtaining a false detection at a level equal to or greater than h.
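For the idealized case in which $T_s$ follows exactly half of a chi-square distribution with one degree of freedom, the integral of Eq. (2) has a closed form in terms of the complementary error function. The sketch below evaluates it; the function names are hypothetical, and the distributions actually produced by the analysis may depart from this ideal form.

```python
import math

def p_value_half_chi2_1(h):
    """P(T_s > h) for T_s distributed as (1/2) chi-square with 1 dof, h >= 0."""
    return 0.5 * math.erfc(math.sqrt(h / 2.0))

def trials_per_false_detection(h):
    """Average number of empty-field trials per false detection with T_s > h."""
    return 1.0 / p_value_half_chi2_1(h)

p_at_9 = p_value_half_chi2_1(9.0)          # one-sided Gaussian tail at 3 sigma
n_at_9 = trials_per_false_detection(9.0)
```

With this form a threshold h corresponds to a one-sided Gaussian significance of √h.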



2.1. Hypothesis formulation 

In the context of the γ-ray transient analysis, the null hypothesis
is defined as an analysis region containing only steady and
known sources with no flaring sources present. We can translate 
this into the ensemble of models by keeping the flux of the flar- 
ing source fixed to zero, and the fluxes of steady sources fixed 
to their known fluxes. In the alternative hypothesis that a flar- 
ing source is present, the flux (and position if specified) of this 
source is allowed to be free and the fluxes of steady sources are 
fixed to their known fluxes. In this work, we restrict our analy- 
sis to hypotheses of single flaring sources, neglecting alternative 
hypotheses of 2 or more flaring sources in the same analysis re- 
gion. 

Additional knowledge of the source, e.g. from other wave- 
lengths, can add useful additional constraints about the position 
of the source in the hypothesis formulation. In Section 5 we
show that this additional knowledge can change the $T_s$ distribution,
and thereby reduce the occurrence rate of false detection.

For the analysis of a flaring source, we consider two possible 
scenarios: 

1. The flaring source is unknown: in this case, the position and
flux parameters are allowed to be free and optimized with
respect to the input data; the starting (l, b) position is usually
the counts peak found in the smoothed map. If the $T_s$ is
higher than a well defined threshold, the alternative (flaring
source) hypothesis is accepted, and a counterpart search may
be performed.

2. The source is known and the alternative hypothesis is that 
this source is in flaring state. This scenario can be further 
subdivided into two cases: 

(a) The mean flux of the source is below the background 
level of the sky region, producing only an upper limit 
over long integrations. 

(b) The mean flux of the source is above the background 
level of the sky region and is detected over long integra- 
tions. 

In both sub-cases, two types of analysis are possible: (i) the 
flux parameter is allowed to be free and the position kept 
fixed, (ii) the flux and position parameters are allowed to be 
free. If the position is allowed to be free, the starting position 
is usually the position of the source in steady state. Keeping 
the position fixed implies that the alternative hypothesis be- 
ing tested is that the flare comes from the known source (e.g. 
from the behaviour in other wavelengths or because other 
flares have been detected in the past), whereas allowing the 
position to vary allows the alternative hypothesis to include 
any flaring source within the analysis region. We may then, 
at the same time as we calculate the significance of the de- 
tection, verify whether the confidence contour of the source 
position is compatible with the source hypothesized to be re- 
sponsible for the flare. If the position of the known source 
is outside the confidence region then the alternative hypoth- 
esis can be rejected in the sense that the known source is not 
responsible for the flare. 

3. Monte Carlo simulations 

Monte Carlo simulations of AGILE γ-ray data were used
to characterize the maximum likelihood analysis procedure.
Simulated data were generated using a model of the background
(Galactic diffuse radiation model and isotropic background) and
the AGILE-GRID instrument response functions (version 10023
of the calibration matrices for effective area, energy dispersion
and point spread function). The energy range used is 100 MeV
- 50 GeV. The simulated observations were generated adding
Poisson-distributed deviates to each pixel. Each bin of the
generated maps (counts, exposure and Galactic emission maps) has
been analyzed exactly as flight data, as described in Section 2.
The bin size chosen for the simulations is 0.25°, the same size
used in the AGILE-GRID daily monitoring.
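The generation of one simulated counts map can be sketched as follows. This is a simplified illustration, not the simulation code used for the paper: it assumes that the expected counts in a bin are the exposure (cm² s sr) multiplied by the diffuse intensity, with $g_{iso}$ expressed in units of 10⁻⁵ cm⁻² s⁻¹ sr⁻¹, and it draws Poisson deviates with Knuth's multiplication method, which is adequate for the small per-bin means of this toy example. All names and numbers are hypothetical.

```python
import math
import random

def poisson_deviate(mean, rng):
    """Draw one Poisson deviate with Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_counts_map(exposure, gal_model, g_gal, g_iso, rng):
    """Poisson counts per bin for a source-free field (diffuse components only)."""
    counts = []
    for exp_row, gal_row in zip(exposure, gal_model):
        row = []
        for e, gal in zip(exp_row, gal_row):
            # Assumed model: exposure times (Galactic + isotropic) intensity.
            expected = e * (g_gal * gal + g_iso * 1e-5)
            row.append(poisson_deviate(expected, rng))
        counts.append(row)
    return counts

# Toy 2 x 2 maps with hypothetical values.
exposure = [[1.0e4, 1.2e4], [0.9e4, 1.1e4]]        # cm^2 s sr
gal_model = [[2.0e-5, 3.0e-5], [2.5e-5, 2.0e-5]]   # cm^-2 s^-1 sr^-1
counts_map = simulate_counts_map(exposure, gal_model, 1.0, 6.0, random.Random(42))
```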

The exposure level chosen for the simulations was a level 
equivalent to a mean value of 1-day pointing/2-day spinning 
AGILE observation mode. 

In Table 1 we report the parameters and the number of trials
for the performed simulations.



4. Extra-Galactic empty field 

4.1. Monte Carlo simulation 

We simulated an extra-Galactic empty field without flaring or
steady sources. The simulation was performed with an AGILE
field of view of 60°. Figure 1 shows the exposure map, and the
bins used in this simulation to perform the trials. This is a typical
2-day exposure map in spinning mode, but in this context
the key point is the level of exposure, and not its shape. We
analyzed positions within 50° of the center of the map to exclude
low values of the exposure, as we also do in the daily sky
monitoring.

We performed a maximum likelihood analysis at positions
corresponding to every fifth degree on the map. The spacing
was chosen to ensure that the analyses would be independent
from one another. The position is kept fixed and the flux
allowed to vary, implying one additional parameter in the
alternative hypothesis. We repeated the counts map simulations and
analysis for different values of the coefficient of the isotropic
diffuse component ($g_{iso} = 6$ and 12) consistent with values found in
real AGILE observations. Figure 2 shows the resulting $T_s$
distribution (left panel) and the related p-value distribution (right
panel). We fit this $T_s$ distribution with the following function:

$$ k'(T_s) = \begin{cases} \delta & \text{if } T_s < 1 \\ \eta\,\chi^2_1(T_s) & \text{otherwise} \end{cases} \tag{3} $$

In Table 2 we report the results of the fit for different background
$g_{iso}$ levels. The function $\delta$ in the first bin takes into account the
constraint on the source counts ($s_c \ge 0$). In this simple case, the
simulated distribution is close to the expected $\frac{1}{2}\chi^2_1$ (see the $\eta$
parameter in Table 2).
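The fitting function of Eq. (3) is simple to evaluate numerically. The sketch below does so with the best-fit values of Table 2 and adds a consistency check under the assumption (made here, not stated in the text) that $\delta$ is the fraction of trials in a first bin of unit width, so that $\delta$ plus the integral of the chi-square term above $T_s = 1$ should be close to 1. Function names are hypothetical.

```python
import math

def chi2_1_pdf(x):
    """Probability density of the chi-square distribution with 1 dof (x > 0)."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

def fit_function(ts, delta, eta):
    """Eq. (3): constant delta in the first bin, eta * chi2_1 density elsewhere."""
    return delta if ts < 1.0 else eta * chi2_1_pdf(ts)

def total_probability(delta, eta):
    """delta plus the integral of eta * chi2_1 over T_s >= 1."""
    return delta + eta * math.erfc(math.sqrt(0.5))

total_giso_6 = total_probability(0.8742, 0.4082)
total_giso_12 = total_probability(0.8681, 0.4280)
```

Under this assumption both parameter sets of Table 2 integrate to 1 within about half a percent.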

In Table 3 we report the results of the fit for different levels
of exposure. Figure 1 shows the regions with different exposure
levels.

Table 2. Best fit parameters in the case of an empty field for
simulated sky maps with bin size of 0.25° for different values
of $g_{iso}$, the isotropic emission coefficient. The fitting function is
reported in Equation 3.

$g_{iso}$   $\delta$                            $\eta$
6           $0.8742 \pm 1.9 \times 10^{-4}$     $0.4082 \pm 2.3 \times 10^{-4}$
12          $0.8681 \pm 1.5 \times 10^{-4}$     $0.4280 \pm 2.0 \times 10^{-4}$

Fig. 1. Extra-Galactic (b = 90°) exposure map used for the
simulations (in units of cm² s sr) in Galactic coordinates. Notice the
larger exposure near the celestial pole. The white circles are the
positions of the trials; the green circles indicate the chosen high
and low exposure regions.




Fig. 2. $T_s$ distribution (left panel, horizontal axis $T_s$) and p-value
distribution (right panel, horizontal axis h) of a simulated empty
field, with $g_{gal} = 1$ and $g_{iso} = 6$ (with these parameters left free
during the analysis), flux free, position fixed. The blue crosses are
the calculated distribution, the black line is the best fit according
to Equation 3, the red dotted line is the $\frac{1}{2}\chi^2_1$ theoretical
distribution, the green dashed line is the $\chi^2_1$ theoretical
distribution, the cyan dotted-dashed line is a further $\frac{1}{2}\chi^2$
reference distribution.



4.2. Real observation 

We compared the simulated data shown in the last section to
a real AGILE observation. The observation block chosen is
OB7410 (see the ASI Data Center web site, http://agile.asdc.asi.it/),
in which AGILE was pointed to the North Galactic Pole with
good exposure. For each day of the observation, counts,
exposure, and gas maps were generated and analyzed. As in the
Monte Carlo simulations, a maximum likelihood analysis was
performed for a hypothetical source position at every fifth
degree to ensure the independence of each trial. The results are
shown in Figure 3. Taking into account the limits of the statistics
collected from this real observation, real and simulated data are
compatible at the $1\sigma$ error level.

Table 1. Parameters and number of trials for the performed extra-Galactic and Galactic field simulations.

Simulation                      Source position  $T_{lcl}$  Simulated ($g_{gal}$, $g_{iso}$)  Analyzed $g_{gal}$ and $g_{iso}$  Trials ($\times 10^6$)
Extra-Gal. empty field          fixed            na         (1.0, 6.0)                        free                              31.0
Extra-Gal. empty field          fixed            na         (1.0, 12.0)                       free                              35.0
Gal. field without sources      fixed            na         (1.0, 6.0)                        free                              7.0
Gal. field without sources      free             5.99147    (1.0, 6.0)                        fixed                             4.0
Gal. field with steady sources  fixed            na         (0.63, 7.70)                      fixed                             9.0
Gal. field with steady sources  free             5.99147    (0.63, 7.70)                      fixed                             17.0
Gal. field with steady sources  free             2.29575    (0.63, 7.70)                      fixed                             2.2

Table 3. Best fit parameters in the case of an empty field for
simulated sky maps with bin size of 0.25°, $g_{iso} = 6$, for two levels
of exposure. The fitting function is reported in Equation 3.

Exposure   $\delta$                            $\eta$
low        $0.8792 \pm 5.6 \times 10^{-4}$     $0.3920 \pm 6.7 \times 10^{-4}$
high       $0.8715 \pm 5.3 \times 10^{-4}$     $0.4172 \pm 6.6 \times 10^{-4}$




Fig. 3. Comparison between p-value distributions for simulated
(blue) and real (green) empty extra-Galactic fields. The red dotted
line is the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed line
is the $\chi^2_1$ theoretical distribution, the cyan dotted-dashed line is
a further $\frac{1}{2}\chi^2$ reference distribution.



5. Monte Carlo simulations of Galactic fields 

We performed simulations of two regions of the Galactic plane: 
a Galactic region with a low density of potential sources, and a 
complex Galactic region (the Cygnus region). These two regions 
represent two extremes for the AGILE analysis. 



5.1. A simple Galactic region 

We performed a simulation of a relatively simple Galactic
region by assuming only the Galactic diffuse and isotropic
emissions without steady or flaring sources. This calculation was
aimed at evaluating the photon density function and the p-value
distribution of the AGILE maximum likelihood estimator in the
presence of the Galactic diffuse emission. We chose a region
centered on (l,b)=(160,0) (Galactic coordinates) with 1-day
exposure level in pointing mode. The parameters used in the
simulation are $g_{gal} = 1$ and $g_{iso} = 3$. During the analysis, the
spectrum of any hypothetical source is kept fixed.

In order to analyze the flux of a source whose position is
known, we fix the position of the source, and allow the flux
to vary in the alternative hypothesis. The resulting p-value
distribution is shown as the brown histogram in Figure 4. Using
Equation 3 we find the best fit with $\delta = 0.8600 \pm 0.0003$ and
$\eta = 0.4540 \pm 0.0005$, for $N_1 = 1$. However, because of the
presence of systematic errors in the event reconstruction, there are
cases in which the source is detectable but at a position farther
from the known position than a purely statistical analysis would
predict. In order to handle these cases, we kept the position of the
source free and we have developed an analysis criterion which
we call ICL (Inside Confidence Contour Level): if the contour
level is present, $T_s > T_{ICL}$ (we fix $T_{ICL} = 9$ for the AGILE
analysis), and the position of the source under investigation is outside
the contour level found by the maximum likelihood procedure,
we reset $T_s$ to 0. A contour in principle always exists (even if it
is not necessarily closed or connected), but sometimes our
software fails to find it. The contour is always searched for at $T_{lcl}$.

We use the ICL criterion only in the case of the presence
of a known source. The $T_s$ distributions presented hereafter with
position free and without the ICL criterion relate to the analysis
of an unknown source. The reason for throwing out the event is
that the technique is used to weed out detections which we are
not sure are coincident with the source.
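The ICL rule as described in the text reduces to a single conditional. The sketch below assumes that the likelihood fit has already produced $T_s$, a flag saying whether a location contour was found, and a flag saying whether the known source position lies inside it; the function and argument names are hypothetical.

```python
def apply_icl(ts, contour_found, source_inside_contour, t_icl=9.0):
    """Return T_s after applying the ICL criterion for a known source.

    T_s is reset to 0 only when T_s > t_icl, a contour was found, and the
    known source position lies outside that contour. In every other case
    T_s is returned unchanged.
    """
    if ts > t_icl and contour_found and not source_inside_contour:
        return 0.0
    return ts

rejected = apply_icl(12.0, contour_found=True, source_inside_contour=False)
kept = apply_icl(12.0, contour_found=True, source_inside_contour=True)
```

Note that a trial for which the software fails to find a contour keeps its $T_s$, as does any trial below the threshold.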

We compare the results with the ICL criterion with the
standard analysis (position left free). The p-value distributions are
reported in Figure 4 (the red histogram for the ICL criterion,
the blue histogram without the ICL criterion, both with $T_{ss} = 4$,
$T_{lcl} = 5.99147$). The blue and red histograms show a pronounced
dip just above $T_{lcl}$ with respect to the brown histogram because
sources with $T_s > T_{lcl}$ may increase their $T_s$ during
relocalization (shift to the right). We characterize the $T_s$ distribution
produced by the addition of the ICL criterion because it is used by
the AGILE automated quick-look analysis when searching for
flares from known sources.

Fig. 4. The effect of the hypothesis formulation (trial selection).
The p-value distribution depends on the constraint on the
position of the source: comparison between different analysis
methods (in particular, keeping fixed and leaving free the
position of the source) in the case of absence of a source at the
(l,b)=(160,0)° location. Histograms are the p-value distributions
for an empty Galactic field when the null hypothesis is true. Blue
and red histograms have the following parameters: $T_{ss} = 4$,
$T_{lcl} = 5.99147$, flux and position of the source left free. The
blue histogram contains all trials regardless of the calculated
position, while the red histogram contains the trials that respect
the ICL criterion. The brown histogram has the position of the
source kept fixed and flux free. The red dotted line is the $\frac{1}{2}\chi^2_1$
theoretical distribution, the green dashed line is the $\chi^2_1$ theoretical
distribution, the cyan dotted-dashed line is a further $\frac{1}{2}\chi^2$
reference distribution.

Fig. 5. The blue histogram is the $T_s$ distribution for the empty
Galactic region when the null hypothesis for a source at the
position (l,b)=(160,0)° is true. $T_{ss} = 4$, $T_{lcl} = 5.99147$, flux and
position of the source left free. The black line is the best fit
function described in Equation 4 with $N_2 = 5$. The red dotted line is
the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed line is the
$\chi^2_1$ theoretical distribution, the cyan dotted-dashed line is a
further $\frac{1}{2}\chi^2$ reference distribution.

Figure 5 reports the $T_s$ distribution when the null hypothesis
is true, produced by maximum likelihood analysis with the
following parameters: $T_{ss} = 4$, $T_{lcl} = 5.99147$, flux and position
of a source at (l,b)=(160,0)° left free. The related
p-value distribution has already been shown in Figure 4 (blue
histogram). The ICL criterion is not applied, i.e. no rejection is
applied on the basis of the compatibility of the location contour
with the position of the source. Fitting this distribution with the
following function,

$$ k''(T_s) = \begin{cases} \delta & \text{if } T_s < 1 \\ \eta_1\,\chi^2_{N_1}(T_s) & \text{if } T_s \ge 1 \text{ and } T_s < T_{lcl} \\ \eta_2\,\chi^2_{N_2}(T_s - T_{lcl}) & \text{otherwise} \end{cases} \tag{4} $$

we find $N_1 = 1$ (if $T_s < T_{ss}$, no optimization of the position
takes place and therefore the only free parameter is the flux of
the source), $N_2 = 5$, $\delta = 0.89 \pm 4.5 \times 10^{-4}$, $\eta_1 = 0.35 \pm 5.1 \times 10^{-4}$
and $\eta_2 = 3.96 \times 10^{-3} \pm 3 \times 10^{-5}$. We use functions with
N = 5 (dof) solely as an analytical approximation to the functional
form produced by this process. The translation ($T_s - T_{lcl}$) is due to
the switch between the fixed and free position regime (see Sect. 2).

Equation 4 is appropriate for blind searches for unknown
sources. When searching for flares from a known source (i.e.
our hypothesis is that there is a known source at the (l,b)=(160,0)°
position), the appropriate alternative hypothesis should exclude
sources for which the known source position lies outside the
location contour. Therefore, applying the ICL rejection criterion
at the 95% confidence level, the resulting p-value distribution is
reported in Figure 4 (red line) compared with the blue histogram
of the same Figure in which no selection criterion is applied. The
effect of appropriate hypothesis formulation is evident. The
application of the ICL rejection reduces the number of degrees of
freedom. When we fit this distribution with Equation 4 we find
that the histogram has a distribution between $N_2 = 3$ and $N_2 = 4$,
due to the ICL selection criterion.

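The following analytical expression can be used,
with $T_{ICL} = 9$ and $T_3 = 14$:

$$
\kappa''(T_s) =
\begin{cases}
\delta & \text{if } T_s < 1 \\
\eta_1 \chi^2_{N_1}(T_s) & \text{if } T_s \geq 1 \text{ and } T_s < t_{lcl} \\
\eta_2 \chi^2_{N_2}(T_s - t_{lcl}) & \text{if } T_s \geq t_{lcl} \text{ and } T_s < T_{ICL} \\
\eta_3 \chi^2_{N_3}(T_s) & \text{if } T_s \geq T_{ICL} \text{ and } T_s < T_3 \\
\eta_4 \chi^2_{N_4}(T_s) & \text{if } T_s \geq T_3
\end{cases}
\qquad (5)
$$
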

for the following values of the parameters: $N_1 = 1$, $N_2 = 5$,
$N_3 = 5$, $N_4 = 1$, $\delta = 0.89 \pm 4.4 \times 10^{-4}$, $\eta_1 = 0.34 \pm 5.1 \times 10^{-4}$,
$\eta_2 = 4.5 \times 10^{-3} \pm 5.7 \times 10^{-5}$, $\eta_3 = 1.27 \times 10^{-2} \pm 1.7 \times 10^{-4}$,
$\eta_4 = 0.91 \pm 3.3 \times 10^{-2}$. This expression approximates the
expected behavior of the analysis, which shifts gradually from a
source location algorithm with many free parameters near the
threshold, where the location contour is large, to an analysis more
similar to a fixed-position analysis at high $T_s$, where the location
contour is small. Figure 6 reports the $T_s$ distribution when the
null hypothesis is true, produced by the maximum likelihood analysis
with the following parameters: $T_{ss} = 4$, $t_{lcl} = 5.99147$, flux
and position left free, and ICL criterion applied; the red histogram
refers to a region centered at (l,b) = (160,0)°, the blue histogram to the
Cygnus region. The black line is the best fit as reported in Eq.
(5). The reported distributions correspond to the standard
quick-look analysis of AGILE data for empty Galactic regions. We
notice that the changes in the selection criterion modify the
expected p-values with respect to the $\frac{1}{2}\chi^2_3$ theoretical expected
distribution (cyan dashed line).

The analysis with the position of the source kept fixed yields 
lower p-values than the analysis with the position allowed to 
vary, either with the ICL criterion (red histogram of Figure 4)
Fig. 6. The effect of steady sources in the ensemble of models.
The blue line is the PDF for the Cygnus field when the null
hypothesis for Cygnus X-3 is true, and the red line is the PDF for an
empty Galactic field when the null hypothesis is true, with the
following parameters: $T_{ss} = 4$, $t_{lcl} = 5.99147$, flux and position
of the source in the alternative hypothesis left free, $g_{gal}$ and $g_{iso}$
parameters fixed, ICL criterion. The black line is the best-fit function
described in Equation (5). The red dotted line is the $\frac{1}{2}\chi^2_1$
theoretical distribution, the green dashed line is the $\chi^2_1$ theoretical
distribution, the cyan dotted-dashed line is the $\frac{1}{2}\chi^2_3$ distribution.



or without (blue histogram). However, the fraction of detections
above a given $T_s$ threshold in the presence of a real source is also
lower when the source position is kept fixed, as shown by the
histograms in Figure 7. Table 4 reports the number of detections
(in %) for some $T_s$ thresholds.

5.2. A complex Galactic region: the case of the Cygnus field 

We simulated observations of the Cygnus region both with
and without a source at the position of Cygnus X-3, to test the
analysis procedure in a complex case with nearby point sources
and Galactic diffuse emission. We have chosen this field because
it is one of the most complex cases that our analysis procedure
must address. Cygnus X-3 is a well-known microquasar
(Giacconi et al. 1967), showing variable emission at all
wavelengths, including repeated γ-ray flaring activity above 100 MeV
as detected by AGILE (Tavani et al. 2009b). This source
shows great variability in the γ-ray energy
range and a high correlation with other wavelengths. The list
of simulated sources of the Cygnus region is reported in Table
5. In Figure 8 we show a 0.5 year integration of AGILE data
from the Cygnus region and a simulation of a comparable
integration using the same parameters used to simulate the short
trials, demonstrating that the underlying model is sound.

The null hypothesis is that no γ-ray source coincident with
Cygnus X-3 is present in the AGILE data, while the alternative
hypothesis is that a source coincident with Cygnus X-3 is
emitting γ-rays. The parameters of the other sources and the diffuse
emission coefficients were all kept fixed.

If we fix the position of the source as already described in
the previous section, using Equation (3) we find the best fit with
$\delta = 0.65 \pm 3.6 \times 10^{-4}$, $\eta = 4.6 \times 10^{-1} \pm 4.2 \times 10^{-4}$, $N_1 = 1$. The two
$\eta$ parameters of the cases of empty and complex Galactic fields




Fig. 7. The effect of the hypothesis formulation (trial selection).
The histograms show the p-value distributions in the presence
of a simulated source at the location (l,b) = (160,0)° with flux
$180 \times 10^{-8}$ photons cm$^{-2}$ s$^{-1}$. Red histograms have the
following parameters: $T_{ss} = 4$, $t_{lcl} = 5.99147$, flux and position of
the source left free with ICL rejection criterion. Blue histograms
have $t_{lcl} = 5.99147$, flux and position of the source left free
without ICL rejection criterion. Brown histograms have the flux left
free and the position of the source kept fixed.





Fig. 8. The binned counts map of the real (left side) and
simulated (right side) Cygnus field for an integration time of 0.5
years (AGILE counts, E > 100 MeV). The real data is taken from
July 2007 to October 2009; the map is centered on (l,b) = (78.75,
0)° in Galactic coordinates with a bin size of 0.1°.

are very similar. Therefore, in Tables 6 and 8 we report a
single value for the fixed position analysis which is valid for
both cases.

Keeping the position of Cygnus X-3 free and fitting with
Equation (4) we find that $N_1 = 1$, $N_2 = 5$, $\delta = 0.85 \pm 4.6 \times 10^{-4}$,
$\eta_1 = 0.46 \pm 6.2 \times 10^{-5}$ and $\eta_2 = 6.15 \times 10^{-3} \pm 0.4 \times 10^{-5}$. This
fitting is appropriate for blind searches for unknown sources in
complex Galactic regions.

Keeping the position of Cygnus X-3 free with the ICL criterion
and fitting with Equation (5) we find that $N_1 = 1$, $N_2 = 5$, $N_3 = 5$,
$N_4 = 1$, $\delta = 0.84 \pm 2 \times 10^{-4}$, $\eta_1 = 0.49 \pm 2.8 \times 10^{-4}$, $\eta_2 =
6.7 \times 10^{-3} \pm 3.2 \times 10^{-5}$, $\eta_3 = 1.8 \times 10^{-2} \pm 1.0 \times 10^{-5}$, $\eta_4 =
1.32 \pm 1.9 \times 10^{-2}$. This expression approximates the expected
behavior of the analysis, as already established in the case of an
Table 4. Percentage of detections for some $T_s$ thresholds (with the related p-values when the null hypothesis is true) when the null hypothesis is
false, with a source at the (l,b) = (160,0)° position and a simulated flux of $180 \times 10^{-8}$ photons cm$^{-2}$ s$^{-1}$.

              Fixed position                         Free position 95%                      Free position 95% + ICL
$T_s \geq$    % detection   p-value                  % detection   p-value                  % detection   p-value
12            18.1          $2.48 \times 10^{-4}$    31.3          $1.21 \times 10^{-3}$    22.3          $4.44 \times 10^{-4}$
16            6.2           $2.95 \times 10^{-5}$    16.1          $2.98 \times 10^{-4}$    10.4          $5.81 \times 10^{-5}$
25            0.7           $2.68 \times 10^{-7}$    2.2           $7.61 \times 10^{-6}$    1.3           $5.26 \times 10^{-7}$

Fixed position: source position fixed, flux free, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95%: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95% + ICL: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed, ICL rejection.



Table 5. List of Cygnus region sources for E > 100 MeV.

AGILE Name       l       b       $\sqrt{T_s}$   Flux         Counterpart name
AGL 2021+4029    78.24    2.16   42.1           141 ± 4      Gamma Cygni
AGL 2021+3652    75.24    0.14   23.3           67 ± 3       PSR J2021+3651
AGL 2030+4129    80.11    1.25   8.1            18 ± 3       LAT PSR J2032+4127
AGL 2026+3346    73.28   -2.49   6.8            10 ± 1.7
AGL 2046+5032    88.99    4.54   6.5            10 ± 1.7
AGL 2016+3644    74.59    0.83   6.3            14 ± 2.3
AGL 2029+4403    81.97    3.04   5.4            14 ± 3
AGL 2038+4313    82.32    1.18   5.1            15 ± 3
AGL 2024+4027    78.56    1.63   5.0            24 ± 5
AGL 2019+3816    76.24    1.14   4.2            11 ± 2.4
AGL 2036+3954    79.47   -0.56   3.4            5.0 ± 1.5

The table provides: (1) the AGILE name of the sources; (2), (3) the Galactic coordinates l and b; (4) the statistical significance $\sqrt{T_s}$ of the source
detection according to the maximum likelihood ratio test for E > 100 MeV; (5) the period-averaged flux F (E > 100 MeV) in $10^{-8}$ photons cm$^{-2}$ s$^{-1}$;
(6) a possible counterpart. We added more sources in the γ-ray model compared with the First AGILE Catalog (Pittori et al. 2009)
to take into account a new background event filter (FM3.119).

mum likelihood estimator can be used as a source finder, instead 
of a hypothesis validator, by setting $t_{lcl} = 0$. In this case the
optimization of the position is performed regardless of the $T_s$
found in the first step with fixed position. The resulting p-value
distributions are shown in Figure 9 (with $t_{lcl} = 0$ shown in green)
including all trials without ICL rejection. As expected, higher $t_{lcl}$
values correspond to lower p-values because the optimization of
the position starts for higher $T_s$ values in the first loop of the
maximum likelihood procedure.



5.3.2. The effect of the radius of analysis 

Figure 10 shows that no appreciable differences are produced by
changing the radius of analysis. The comparison was performed
for the case of $t_{lcl} = 0$ applying the ICL criterion, but should also
be valid for analyses using the other criteria.



5.3.3. The effect of keeping $g_{gal}$ and $g_{iso}$ free or fixed

In Figure 11 we show the effect of keeping the $g_{gal}$ and $g_{iso}$
parameters fixed (blue) and free (red): the p-values with free parameters
are larger. This result is expected because fixing these
parameters reduces the range of possible hypotheses explored. The
comparison was performed for the case of $t_{lcl} = 0$ applying the ICL
criterion, but should also be valid for analyses using the other
criteria.



empty Galactic field (see Section 5.1). The reported distributions
correspond to the standard quick-look analysis of AGILE data
for complex Galactic regions.

Figure 6 compares the probability density function for the
Cygnus field (see Table 5) with that of the empty Galactic field. As
expected, the effect of a more complex region is an increase in
the number of false detections.

With the Monte Carlo simulations performed we have
determined the p-value function for the most common hypothesis
formulations. Based on these simulations we are able to establish
the $T_s$ level for each p-value and constrain the false occurrence
rate. Table 6 reports the correspondence between p-value and $T_s$
value for different methods of analysis, in addition to the
theoretical reference for $\chi^2_1$, $\frac{1}{2}\chi^2_1$ and $\frac{1}{2}\chi^2_3$.
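The theoretical reference columns of Table 6 can be cross-checked with a short numerical sketch such as the one below. It inverts the tail probability of a (possibly halved) chi-square distribution by bisection, using the closed-form survival functions for 1 and 3 degrees of freedom. The function names `chi2_sf` and `ts_threshold` are hypothetical, and the sketch covers only the theoretical references, not the simulated AGILE distributions.

```python
import math


def chi2_sf(x, dof):
    """Chi-square survival function P(X > x), closed form for dof = 1 or 3."""
    if x <= 0.0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    if dof == 1:
        return tail
    if dof == 3:
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("only dof = 1 or dof = 3 are implemented in this sketch")


def ts_threshold(p_value, dof, weight=1.0):
    """T_s at which weight * chi2_sf(T_s, dof) equals p_value (bisection).

    weight = 0.5 gives the halved references, e.g. (1/2) chi^2_1.
    """
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weight * chi2_sf(mid, dof) > p_value:
            lo = mid   # tail still too large: move the threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        row = (ts_threshold(p, 1), ts_threshold(p, 1, 0.5), ts_threshold(p, 3, 0.5))
        print(f"p = {p:.0e}: " + "  ".join(f"{v:6.2f}" for v in row))
```

For a p-value of $10^{-2}$ the three thresholds should agree with the first row of the theoretical columns of Table 6 to within rounding.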

5.3. Deviation from the nominal distribution 

In the following we report the effect that uncertainties in the 
analysis parameters produce on the shape of the distributions. 
The analyses were performed for the case of the complex 
Galactic region. 

5.3.1. The effect of the $t_{lcl}$ parameter

We performed additional maximum likelihood analyses using
different values of the $t_{lcl}$ parameter, including the case of $t_{lcl} =
0$. In the case of a blind search for unknown sources, the maxi-
Table 6. Correspondence between p-value and $T_s$ value for different methods of analysis. The first column reports the p-value, the
following columns report the corresponding $T_s$ value. In particular, the fifth column reports the corresponding $T_s$ value for Galactic
regions with the position of the source of the alternative hypothesis kept fixed, the sixth and last columns report the corresponding
$T_s$ value for Galactic regions (first number for empty regions, second number for complex regions) with the position of the source
of the alternative hypothesis kept free.

                                                                    $T_s$
p-value      $\chi^2_1$   $\frac{1}{2}\chi^2_1$   $\frac{1}{2}\chi^2_3$   Fixed position   Free position 95%   Free position 95% + ICL
$10^{-2}$    6.63         5.41                    9.84                    5.29             4.78-5.27           4.78-5.41
$10^{-3}$    10.83        9.55                    14.80                   9.42             12.60-13.90         9.89-10.84
$10^{-4}$    15.14        13.83                   19.66                   13.70            18.81-19.91         14.97-15.67
$10^{-5}$    19.52        18.19                   24.47                   18.06            24.36-25.40         19.35-20.05
$10^{-6}$    23.93        22.60                   29.23                   22.46            29.66-30.66         23.76-24.47
$10^{-7}$    28.37        27.03                   33.98                   26.90            34.81-35.79         28.21-28.92
$10^{-8}$    32.83        31.49                   38.71                   31.36            39.87-40.84         32.67-33.39
$10^{-9}$    37.32        35.97                   43.42                   35.84            44.87-45.82         37.16-37.87

Fixed position: source position fixed, flux free, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95%: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95% + ICL: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed, ICL rejection.




Fig. 9. The p-value distributions change as a function of the
$t_{lcl}$ parameter. Simulations of the Cygnus region with no source
present at the Cygnus X-3 position were analyzed with different $t_{lcl}$
values without applying the ICL rejection criterion. The flux
and position of the hypothetical source at the Cygnus X-3 position were
allowed to vary, the $g_{gal}$ and $g_{iso}$ parameters were kept fixed,
and $T_{ss} = 4$. Green histogram: $t_{lcl} = 0$; black histogram:
$t_{lcl} = 2.29575$ (corresponds to a 68% confidence level for two
degrees of freedom); blue histogram: $t_{lcl} = 5.99147$. The red
dotted line is the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed
line is the $\chi^2_1$ theoretical distribution, and the cyan dotted-dashed
line is the $\frac{1}{2}\chi^2_3$ distribution.




Fig. 10. Comparison between different radii of analysis. The
histograms are the p-value distributions for the Cygnus field when
the null hypothesis for Cygnus X-3 is true with the following
parameters: $T_{ss} = 4$, $t_{lcl} = 0$, flux and position of Cygnus X-3
left free, ICL rejection applied. Red histogram: radius of analysis
= 10°; blue histogram: radius of analysis = 5°. The red dotted
line is the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed line is
the $\chi^2_1$ theoretical distribution, and the cyan dotted-dashed line
is the $\frac{1}{2}\chi^2_3$ distribution.



5.3.4. The effect of unmodeled point sources 

We performed a Monte Carlo simulation in which the simulated
data contain all of the sources listed in Table 5, followed by
a maximum likelihood analysis with models containing only a
point source at the location of Cygnus X-3, in order to evaluate
the effect of nearby unmodeled point sources. The resulting
p-value distribution is shown in Figure 12, where the red histogram



is the p-value distribution with the analysis models containing
all the sources used in the simulation and the black histogram is
the p-value distribution with only the source at the Cygnus X-3
location. This illustrates that it is critical to model existing sources
correctly.
Fig. 11. Comparison between $g_{gal}$ and $g_{iso}$ parameters free or
fixed. The blue histogram is the p-value distribution for the Cygnus
field when the null hypothesis for Cygnus X-3 is true with the
following parameters: $T_{ss} = 4$, $t_{lcl} = 0$, flux and position of
Cygnus X-3 left free, $g_{gal}$ and $g_{iso}$ parameters fixed. The red
histogram has the same parameters but with the $g_{gal}$ and $g_{iso}$
parameters left free. The red dotted line is the $\frac{1}{2}\chi^2_1$ theoretical
distribution, the green dashed line is the $\chi^2_1$ theoretical distribution,
the cyan dotted-dashed line is the $\frac{1}{2}\chi^2_3$ distribution.







Fig. 12. The effect of unmodeled sources. The histograms are
the p-value distributions for the Cygnus field when the null
hypothesis for Cygnus X-3 is true with the following parameters:
$T_{ss} = 4$, $t_{lcl} = 5.99147$, flux and position of Cygnus X-3 left
free, $g_{gal}$ and $g_{iso}$ parameters fixed. The red histogram shows the
result when all the simulated sources are included in the models,
while the black histogram shows the result when only the
source at the Cygnus X-3 position is included. The red dotted line is
the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed line is the $\chi^2_1$
theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2}\chi^2_3$
distribution.



5.3.5. The effect of errors in the diffuse emission: estimating
using $g_{gal}$

We performed a preliminary investigation of the effect of
systematic errors in the diffuse emission model on the analysis
results by fixing the $g_{gal}$ parameter during the analysis at a value,
$g_{gal} = 0.67$, different from the one used in simulating the data;
$g_{iso} = 7.7$ is simulated, but the $g_{iso}$ parameter is left free to adapt
during the analysis. Figure 13 reports the p-value distributions
for the Cygnus field when the null hypothesis for Cygnus X-3
is true, comparing the results of three analyses: one
with simulated $g_{gal} = 0.67$ (blue thick lines), one with
simulated $g_{gal} = 0.67 \times 1.1$ (red thick lines) and one with simulated
$g_{gal} = 0.67 \times 0.89$ (black thick lines). Table 7 reports the resulting
calculated $g_{iso}$.

Table 7. The mean value of the calculated $g_{iso}$ parameter

Simulations                        Analysis
$g_{gal}$            $g_{iso}$     $g_{gal}$ fixed   $g_{iso}$ calculated
0.67 × 0.89          7.7           0.67              6.55 ± 1.46
0.67                 7.7           0.67              7.55 ± 1.52
0.67 × 1.1           7.7           0.67              8.52 ± 1.59



As expected, $g_{iso}$ moves up if $g_{gal}$ is too small and vice versa.
If $g_{gal}$ is under-estimated, the number of false detections
when the source is absent increases (see the red lines of Figure
13), because background photons are assigned to the source. If
the $g_{gal}$ parameter is over-estimated, the number of false
detections when the source is absent decreases (see the black line of
Figure 13), because the diffuse model is already too high at the
position of Cygnus X-3.

6. Pre-trials and post-trials significance 

We have seen how the Monte Carlo simulations can be used to
characterize the $T_s$ distributions produced by the AGILE-GRID
maximum likelihood analysis procedure. From these distributions we obtain
the probability, or p-value, of a false positive detection
(rejecting the null hypothesis when it is true) in a single
observation.

In practice, for each region of the sky we perform many trials
during the daily monitoring to search for transient γ-ray events.
The probability of obtaining at least one false detection over a large
number of trials is therefore much higher than p. In the AGILE
context we perform two kinds of analysis for each analyzed map:

1. blind search for unknown sources: we search for more than 
one source at a time with free positions; 

2. searches from a list of known sources: we search for more 
than one source at a time with fixed source positions. 

Let $K = M \cdot N$ be the number of independent trials, where $N$ is
the number of maps and $M$ is the number of unknown sources
in the first case or the number of sources in the predefined list in
the second case. If we have only one source, in both cases $M = 1$.

Since the probability of not making a false positive (type-I) error in a
single trial is $1 - p$, the probability of not making any false
positive error in $K$ trials is $(1 - p)^K$, so the probability of making at
least one false positive error is $\pi = 1 - (1 - p)^K$. This is defined
as the post-trial probability, also referred to as the
experiment-wide error rate, while p is denoted as the pre-trial probability, or
Fig. 13. The effect of a poor estimation of the Galactic diffuse
emission. The histograms are the p-value distributions for the
Cygnus field when the null hypothesis for Cygnus X-3 is true
with the following parameters: $T_{ss} = 4$, $t_{lcl} = 5.99147$, flux and
position of Cygnus X-3 left free. The true value of $g_{iso}$ in the
simulated data is 7.7, the value of $g_{gal}$ used in the analysis is
fixed to 0.67. Blue thin line: simulated $g_{gal} = 0.67$, $g_{iso}$ fixed to
7.7; blue thick line: simulated $g_{gal} = 0.67$, $g_{iso}$ left free; red thick
line: simulated $g_{gal} = 0.67 \times 1.1$, $g_{iso}$ left free; black thick line:
simulated $g_{gal} = 0.67 \times 0.89$, $g_{iso}$ left free. The red dotted line is
the $\frac{1}{2}\chi^2_1$ theoretical distribution, the green dashed line is the $\chi^2_1$
theoretical distribution, the cyan dotted-dashed line is the $\frac{1}{2}\chi^2_3$
distribution.



comparison-wise error rate. For an experiment-wide false
positive rate of $\pi$, we can constrain the comparison-wise error rate
with the Dunn-Šidák correction $p < 1 - (1 - \pi)^{1/K}$.

Let us consider a typical AGILE case in the context of
Galactic γ-ray transients, with a single source with flux and
position left free for each map
($M = 1$): usually we keep the position and flux of the known
sources fixed, assuming that only one source is in a flaring state in our
map (we can reduce the size of the map to accomplish this). If we
search for a single flaring source once every two days ($N = 182$
for 1 year of observations in AGILE spinning mode) and we can
accept one false detection during the year, $\pi < 1/N$, this implies
a threshold of $T_s = 11.9$. If we can accept a false detection once
every 2 years, the threshold is $T_s = 20.5$.

Table 8 reports the post-trial significance expressed in
Gaussian standard deviations for some values of $K$.
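
The conversion between pre-trial and post-trial quantities can be sketched in a few lines. The example below implements $\pi = 1 - (1 - p)^K$, its inverse (the Dunn-Šidák correction), and a conversion to Gaussian standard deviations. The one-sided Gaussian convention is an assumption of this sketch, inferred from the correspondence between $3\sigma$ and $p = 1.35 \times 10^{-3}$ in Table 8; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def post_trial_probability(p, k):
    """pi = 1 - (1 - p)^K, written with log1p/expm1 for small p."""
    return -math.expm1(k * math.log1p(-p))


def sidak_pre_trial(pi, k):
    """Dunn-Sidak correction: pre-trial p for a post-trial rate pi over K trials."""
    return -math.expm1(math.log1p(-pi) / k)


def to_sigma(prob):
    """One-sided Gaussian significance of a tail probability (assumed convention)."""
    return -NormalDist().inv_cdf(prob)


if __name__ == "__main__":
    p_pre = 3.17e-5  # pre-trial p-value listed in Table 8 for 4 sigma
    for k in (180, 360):
        pi = post_trial_probability(p_pre, k)
        print(f"K = {k}: post-trial probability {pi:.3e}, "
              f"significance {to_sigma(pi):.2f} sigma")
```

With the $4\sigma$ pre-trial p-value of Table 8, the printed significances should come out close to the post-trial values listed there for $K = 180$ and $K = 360$.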



7. Probability of sky-position coincidence in 
non-overlapping time intervals 

In this section we generalize the analysis performed in the pre- 
vious sections to calculate the probability of two or more de- 
tections of a flaring source performed in the same sky position 
in different independent time intervals. Multiple detections of a 
source can have a low probability of being consistent with the 
null hypothesis even when the individual detections are at a low 
level of $T_s$. In order to assess the statistical significance of our
detections, we consider the post-trial probability of flare occur- 
rence. We distinguish two cases: 



1. the case of a single flare episode originating from a specific
source within a given error box (that we define as "single
independent occurrence" or "single post-trial occurrence");

2. the case of repeated flaring episodes originating from a
specific source within a given error box (that we call here
"repeated post-trial flare occurrence").

For each individual AGILE detection, we can calculate the post- 
trial significance of the single independent occurrences, which 
does not take into account the history of repeated occurrences. 

We can then combine the history of the sky region and es- 
tablish the probability of repeated flaring episodes from the same 
sky position. We calculate the post-trial significance for repeated 
flare occurrences at the source error-box position as follows. 

If we perform one trial for each map ($M = 1$: we use
a list with only one source, so that each independent
time period is a single trial), the chance probability of
having $k$ or more detections over $N$ maps at a specific site with
a $T_s$ statistic satisfying $T_s \geq h$ is given by

$$P(N, X \geq k) = 1 - \sum_{j=0}^{k-1} \binom{N}{j} p^j (1 - p)^{N-j},$$

where $p = p(h)$ is the p-value corresponding to the $h$ value given by Equation (2),

$$P(N, X = j) = \binom{N}{j} p^j (1 - p)^{N-j}$$

is the probability of exactly $j$ detections in $N$ maps, and

$$P(N, X < k) = \sum_{j=0}^{k-1} \binom{N}{j} p^j (1 - p)^{N-j}$$

is the probability of fewer than $k$ detections at a specific position in $N$ maps.

If we perform $M$ trials in different positions of $N$ maps,
where $M$ is the number of known or unknown sources in a
predefined list, the chance of having $k$ or more detections
in any of the sites with a $T_s$ statistic satisfying $T_s \geq h$ is
given by

$$P_M(N, X \geq k) = 1 - \left[ \sum_{j=0}^{k-1} \binom{N}{j} p^j (1 - p)^{N-j} \right]^M,$$

that is, $P_M(N, X \geq k) = 1 - P(N, X < k)^M = 1 - (1 - P(N, X \geq k))^M$.

The choice of $p(h)$ depends on our assumptions: if the flare
comes from a known source we use Equation (5); if the flare
comes from a previously unknown source we use Equation (4)
(the case without ICL criterion).

Let us consider a simple case in the context of Galactic γ-ray
transients, using the p-value function in the case of a complex
Galactic region with the position of a single source left free. If
we detect one flare per year at a specific position of the Galactic
plane with $T_s \geq 16$, and we produce 182 maps per year (once
every two days with an integration time of two days), then after
the first year the global post-trial significance is $2.16\sigma$, after the
second year it is $3.31\sigma$, and after the third year it is $4.16\sigma$.
Transient sources, as long as they recur, enable us to have more
confidence in their detection as integration time increases.
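
The binomial expressions above can be evaluated directly, as in the following sketch. The single-trial p-value used here, $p \approx 8.5 \times 10^{-5}$ for $T_s \geq 16$, is an assumption of this example: it is obtained by a rough logarithmic interpolation of the complex-region "Free position 95% + ICL" column of Table 6, not taken from the fitted function. The one-sided Gaussian conversion and the function names are likewise assumptions.

```python
import math
from statistics import NormalDist


def prob_at_least_k(n_maps, k, p):
    """P(N, X >= k): chance of k or more detections in n_maps independent maps."""
    fewer = sum(math.comb(n_maps, j) * p ** j * (1.0 - p) ** (n_maps - j)
                for j in range(k))
    return 1.0 - fewer


def prob_at_least_k_any_site(n_maps, k, p, m_sites):
    """P_M(N, X >= k): same event at any of m_sites independent positions."""
    return 1.0 - (1.0 - prob_at_least_k(n_maps, k, p)) ** m_sites


def to_sigma(prob):
    """One-sided Gaussian significance of a tail probability (assumed convention)."""
    return -NormalDist().inv_cdf(prob)


if __name__ == "__main__":
    p_single = 8.5e-5   # assumed p-value for T_s >= 16 (interpolated, see lead-in)
    maps_per_year = 182
    for year in (1, 2, 3):
        # one flare per year: `year` detections in `year * 182` maps
        chance = prob_at_least_k(year * maps_per_year, year, p_single)
        print(f"after year {year}: chance probability {chance:.2e}, "
              f"{to_sigma(chance):.2f} sigma")
```

Under these assumptions the three significances should land close to the values quoted in the text for the first, second, and third year. Passing `m_sites > 1` to `prob_at_least_k_any_site` gives the corresponding estimate when several positions are monitored.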

It might seem that this approach adds a bias to the global
significance of a detection of a flaring source, because
flares from nearby sources can be "counted"
together due to the extension of the 95% confidence
level contour. In doubtful cases only a more detailed analysis can
exclude this.



8. Conclusions 

We have performed extensive Monte Carlo simulations to char- 
acterize the maximum likelihood ratio test for the AGILE-GRID 
instrument in the context of short timescale (1-2 days) flaring 
γ-ray sources, both in extra-Galactic and Galactic fields. In the
case of Galactic fields, we have simulated both a simple (without 



Table 8. Post-trial significance expressed in Gaussian standard deviations ($\sigma$). The first column reports the pre-trial significance,
the second column reports the corresponding p-value, the third column reports the corresponding $T_s$ value for Galactic regions
with the position of the source of the alternative hypothesis kept fixed, the fourth and fifth columns report the corresponding $T_s$
value for Galactic regions (first number for empty regions, second number for complex regions) with the position of the source of
the alternative hypothesis left free, the last two columns report the post-trial significance for $K = 180$ and $K = 360$ trials.

                                            $T_s$                                                            $\sigma$ post-trial
$\sigma$ pre-trial   p-value                  Fixed position   Free position 95%   Free position 95% + ICL   K=180   K=360
3                    $1.35 \times 10^{-3}$    8.88             11.66-13.03         9.08-10.05                0.78    0.29
4                    $3.17 \times 10^{-5}$    15.87            21.63-22.69         17.14-17.85               2.53    2.28
5                    $2.86 \times 10^{-7}$    24.87            32.47-33.46         26.17-26.88               3.88    3.71
6                    $9.21 \times 10^{-10}$   36.00            45.05-46.00         37.31-38.03               5.10    4.97

Fixed position: source position fixed, flux free, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95%: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed.

Free position 95% + ICL: source position free, flux free, $t_{lcl} = 5.99147$, $T_{ss} = 4$, $g_{gal}$ and $g_{iso}$ parameters fixed, ICL rejection.



steady sources) and a complex Galactic region. With these
simulations we have calibrated both the $T_s$ distributions (pre-trial
significance) and the related false occurrence rate.

After the introduction of the post-trial significance, we cal- 
culated the post-trial probabilities for single and multiple oc- 
currences. In particular, we calculated the probability of two or 
more detections of a flaring source at the same sky position in 
different independent time intervals. With this approach, we take 
into account the presence of many flaring episodes originating 
from the same sky region, combining its history and adding in- 
formation not present in the single episode and in the post-trial 
evaluation. We call this "repeated post-trial flare occurrence". 

In this paper we have provided a method for converting the 
$T_s$ produced by any of the various methods made available by
the AGILE analysis software into a probability. This information 
can be used by anyone who performs analysis on GRID data 
through the AGILE Guest Observer Program. 